Developing, characterizing and modeling CRISPR-based point-of-use pathogen diagnostics

Recent years have seen intense interest in the development of point-of-care nucleic acid diagnostic technologies to address the scaling limitations of laboratory-based approaches. Chief among these are combinations of isothermal amplification approaches with CRISPR-based detection and readouts of target products. Here, we contribute to the growing body of rapid, programmable point-of-care pathogen tests by developing and optimizing a one-pot NASBA-Cas13a nucleic acid detection assay. This test uses the isothermal amplification technique NASBA to amplify target viral nucleic acids, followed by Cas13a-based detection of amplified sequences. We first demonstrate an in-house formulation of NASBA that enables optimization of individual NASBA components. We then present design rules for NASBA primer sets and LbuCas13a guide RNAs for fast and sensitive detection of SARS-CoV-2 viral RNA fragments, resulting in 20 – 200 aM sensitivity without any specialized equipment. Finally, we explore the combination of high-throughput assay condition screening with mechanistic ordinary differential equation modeling of the reaction scheme to gain a deeper understanding of the NASBA-Cas13a system. This work presents a framework for developing a mechanistic understanding of reaction performance and optimization that uses both experiments and modeling, which we anticipate will be useful in developing future nucleic acid detection technologies.

Supplementary Fig. 3 | Sequential NASBA reveals various off-target products.a, Sequential NASBA using RTs tested in Fig. 1 generates different off-target products at different steps of the reaction (Sequential NASBA in Materials and Methods).Testing the effect of RNase H on offtarget products.Reactions were performed sequentially with 0.5 U/µL AMV RT and initiated either by b, SARS-CoV-2 input RNA fragment targeted by primer set 5 or by c, CMV input RNA fragment.Data in a-c are a representative of n=3 independent biological replicates.Uncropped, unprocessed gel images in a-c are available as Supplementary Data 2. Supplementary Fig. 4 | Optimization of in-house NASBA-Cas13a.DMSO improves NASBA efficiency.Fluorescence kinetics from NASBA-Cas13a (primer set 8 -gRNA 1) initiated by 0, 5 or 50 fM synthetic SARS-CoV-2 genome in the presence of varying concentrations of DMSO: a, 0%, b, 5%, c, 10% and d, 15%.Adding fresh DTT and BSA with 15% DMSO increases the final fluorescence magnitude.Fluorescence kinetics from NASBA-Cas13a (primer set 8 -gRNA 1) initiated with 0, 0.2, 2 or 20 fM synthetic SARS-CoV-2 genome e, without fresh DTT or BSA, f, with 5 mM fresh DTT, g, with 0.1 µg/µL BSA or h, with 5 mM fresh DTT and 0.1 µg/µL BSA.Data shown are for n=3 independent biological replicates, each plotted as a line with raw fluorescence standardized to MEF (µM fluorescein).Shading indicates the average of the replicates ± standard deviation.Sequences of primers and gRNAs are listed in Supplementary Data 1.Supplementary Fig. 5 | Screening of guide RNAs and activator RNAs.Fluorescence kinetics for NASBA-Cas13a with varying concentrations of synthetic SARS-CoV-2 genome using a, gRNA 2-1, b, gRNA 2-2, c, gRNA 7-4 or d, gRNA 3-1.A predicted secondary structure of each gRNA including the constant region and the spacer sequence is depicted.The spacer sequence in gRNA 2-2 and gRNA 7-4 overlaps with the corresponding NASBA primer binding site (red).gRNA 3-1 is not predicted to form the necessary hairpin structure (red) required for complexing with LbuCas13a, which potentially contributes to the observed low cleavage efficiency.e, A predicted secondary structure of activator RNA 6.The region targeted by each gRNA is shaded.Fluorescence kinetics for NASBA-Cas13a with varying concentrations of synthetic SARS-CoV-2 genome using f, gRNA 6-1, g, gRNA 6-2 or h, gRNA 6-3 with predicted secondary structure of each gRNA above the kinetic traces.Lines on gRNA 2-1 (a) and gRNA 6-2 (g) indicate additional predicted long-range interactions.Data shown are for n=3 independent biological replicates, each plotted as a line with raw fluorescence standardized to MEF (µM fluorescein).Shading indicates the average of the replicates ± standard deviation.Supplementary Fig. 6 | Experimental conditions before and after data processing.a, Full experimental data set before pre-processing.The conditions shown here were run with both low (2.25 nM) and high Cas13a-gRNA (45 nM).Numbers for T7 RNAP, RNase H and RT are in units of U/µL.b, The model did not aim to describe low Cas13a-gRNA conditions or conditions with 0 input RNA, so these conditions were withheld.An additional condition (dark gray) was withheld from the training data for data set 1 due to high measurement error, but this condition was used in the training data for data sets 2 and 3 (Supplementary Fig. 9).For each data set, the out of sample data was defined as the conditions after pre-processing that were not in the training data.Supplementary Fig. 9 | Pre-processing of training data to remove conditions with relatively high measurement error.a, The mean proportion error, p, was calculated for each condition, j in data set 1 (Materials and Methods).b, For the condition in data set 1 with pj > 0.3 within the subset of conditions before pre-processing (Materials and Methods), normalized readout trajectories for each replicate are shown (gray), along with the normalized mean readout trajectory across all 3 replicates (blue).This condition had one replicate for which the readout remained near zero throughout the time course, in contrast to the other replicates in the condition.c-d, For data sets 2 and 3, the mean proportion error, p, was calculated for each condition, j.The condition with pj > 0.3 in each data set was not in the subset of conditions before pre-processing (Materials and Methods), so these conditions were not analyzed further or removed from training data.In each case, nsearch = 5000 and ninit = 24.Note that the PEM evaluation criterion is not exactly met for model C, Data Set 1, PEM evaluation Data Set 3; model C, Data Set 3, PEM evaluation Data Set 2; and model D, Data Set 3, PEM evaluation Data Set 1.In each case, the PEM evaluation data set is just under (<0.005) the PEM evaluation criterion.This is a minor inconsistency that is unlikely to affect downstream parameter estimation results.Supplementary Fig. 12 | Calibration and analysis of sub-optimal candidate model A for Data set 1. a-c, Time course trajectories for data subsets with simulated data generated with a, midrange RNase H and T7 RNAP and high Cas13a-gRNA, b, mid-range RNase H and RT and high Cas13a-gRNA and c, mid-range T7 RNAP and RT and high Cas13a-gRNA.across all conditions simulated in the data set (experimental column) or all conditions in the training data set (simulation column) were calculated for: b, t1/2 and c, Fmax.d-f, Time course trajectories for data subsets: d, mid-range RNase H and T7 RNAP and high Cas13a-gRNA, (the 0.5 U/µL condition was omitted as it was removed from the training data due to high measurement error (Supplementary Fig. 9) e, mid-range RNase H and RT and high Cas13a-gRNA and f, midrange T7 RNAP and RT and high Cas13a-gRNA.g, Parity plot for the correlation between normalized experimental data and normalized simulated data.Each point in the plot represents a combination of enzyme doses and time point.In a scenario of perfect agreement, all points would be on the y = x line (gray).

Supplementary
training data set (simulation column) were calculated for: b, t1/2 and c, Fmax.d-f, Time course trajectories for data subsets: d, mid-range RNase H and T7 RNAP and high Cas13a-gRNA e, mid-range RNase H and RT and high Cas13a-gRNA and f, mid-range T7 RNAP and RT and high Cas13a-gRNA.g, Parity plot for the correlation between normalized experimental data and normalized simulated data.Each point in the plot represents a combination of enzyme doses and time point.In a scenario of perfect agreement, all points would be on the y = x line (gray).
training data set (simulation column) were calculated for: b, t1/2 and c, Fmax.d-f, Time course trajectories for data subsets: d, mid-range RNase H and T7 RNAP and high Cas13a-gRNA, e, mid-range RNase H and RT and high Cas13a-gRNA and f, mid-range T7 RNAP and RT and high Cas13a-gRNA.g, Parity plot for the correlation between normalized experimental data and normalized simulated data.Each point in the plot represents a combination of enzyme doses and time point.In a scenario of perfect agreement, all points would be on the y = x line (gray).Supplementary Fig. 27 | Parameter sensitivity analysis using MSE as a metric.a-c, percent change in MSE when increasing each parameter individually by 10% (left) or decreasing each parameter individually by 10% (right), relative to the MSE for the calibrated parameter set, for Data Sets 1-3, respectively.

Cas13 deactivation
An exponential decay function was used to define deactivation of Cas13a indiscriminate ssRNase activity over time.At each time (), the fraction of active Cas13a ( 56789 ) was calculated using the Python package SciPy's stats module 1 expon function: where   is the total target-activated Cas13a:

Non-monotonic RNase H activity
A beta distribution was used to define  BCD for each initial condition of RNase H,   , , where  HIJ and  HIJ are shape parameters for the beta distribution and  HIJ determines the scaling on the distribution: RNase H initial conditions were scaled such that each condition was between 0 and 1, which are the bounds of the beta distribution.

Re-scaling
To avoid numerical instability, we re-scaled xinput RNA such that: With  4(3:& = 10 X .We show separate derivations for the ODE describing    and all other ODEs.For the ODE that tracks xinput RNA, we used the following derivation for rescaling: where   is all state variables that are not    .Substituting for  ] , the equation is: After simplification, the equation becomes: Therefore, no changes are required for the ODE that tracks xinput RNA.For each remaining term involving xinput RNA, we used the following equation for re-scaling: After substituting for   , the equation becomes: Therefore, for each term including    R (in all equations except for the one that tracks    ), the term is divided by  4(3:& .Re-scaling is accounted for in Supplementary Table 4.

Conservation laws
Conservation laws were applied to internal model states that by definition are conserved.At each time step, the following equations were used to calculate concentrations.In each equation, the first term after the equals sign is the initial value.Algorithm for PEM evaluation Before using the PEM to estimate parameters to describe the training data, we evaluated the ability of the proposed PEM to find parameter sets yielding good agreement with the training data.
To accomplish this, we generated three PEM evaluation data sets using three different sets of PEM evaluation parameters (Supplementary Fig. 10b).Each PEM evaluation data set has the same structure as the training data, and each data set was generated using the proposed model in question.This process was repeated for each new version of the model (A, B, C, and D).After generation of the PEM evaluation data, parameters were fit to the data, and the ability of the PEM to identify parameter sets with good agreement with the training data was evaluated by calculating the PEM evaluation criterion.The PEM evaluation criterion was satisfied for each model, for each data set (Supplementary Fig. 11a-l), which supports that the proposed PEM is appropriate for each model given the structure of the training data.PEM evaluation is described further in the initial report of the GAMES workflow 2 .

Generation of PEM evaluation training data
It is important to accurately represent measurement error when evaluating a parameter estimation method for a given set of training data.For each PEM evaluation data set, we first simulated each component condition (defined as the raw simulated data), then added noise to approximate the experimental measurement error.For each condition, we first calculated the standard error,  !"# , associated with each data point, i, using the following equation: $  '()*%+,-(.

(Equation S19)
!& is the standard deviation of each data point, all of which were collected in triplicate ( fLghij6kL7 = 3).
Next, for each condition, j, we calculated the mean,  l , and standard deviation,  !&,/ , of  !"# across all data points for each condition. l and  !&,/ were then used to define a normal distribution from which values were randomly generated and added or subtracted to each data point in the raw simulated data.If subtracting a noise value led a data point that was less than 0, another random number was generated until the data point with noise was greater than or equal to 0.
To determine whether the noise value was added or subtracted from the raw data point, we used a random number generator that output a value of either 0 or 1.If the value was 0, the noise value was subtracted from the raw value and if the value was 1, the noise value was added to the raw value.
We found that, for all conditions, measurement error was near zero for the first 10 time points, so we did not include these data points in the mean measurement error calculations.We also did not add noise to these data points.

Determination of PEM evaluation criterion
For each PEM evaluation data set, we calculated the R 2 value between the raw simulated data and the simulated data with added noise.The minimum R 2 value across the 3 PEM evaluation data sets was used to define the PEM evaluation criterion.

Parameter estimation method details
For all models, parameter estimation simulations were initiated with the literature values in Supplementary Table 2 and allowed to vary 3 orders of magnitude in each direction.We used the literature values for the free parameters ( m3489 ,  <&n_* ,  !"# 30.2 ,  pqq ,  BCD ) only as order of magnitude estimates, as these values were all determined with different systems and under different conditions than our training data.In initial simulations, we found that large values of kCas13 led to stiff dynamics that stalled out the ODE solver, even when an algorithm appropriate for stiff ODEs was used, so we initialized kCas13 at a value 2 orders of magnitude below the literature estimate.Literature estimates for  012,345267856719 and  :2504,345267856719 were unavailable.These parameters were each allowed to vary between 0 and 240.Bounds for the beta distribution shape parameters,  ;<= and  ;<= , were based on visual inspection of the beta distribution to enable non-monotonic behavior in the regime of the scaled [ ] > values. ;<= was allowed to vary between 1 and 10 and  ;<= was allowed to vary between 1 and 100.Because  ;<= acted as a scaling parameter and did not change the shape of the beta distribution, it was allowed to vary between 10 -3 and 10 3 .
For Models A, C, and D, we found it necessary to implement a timeout function in the global search such that if a parameter set takes too long to solve (in our case, we used t = 100s as the timeout condition), it was skipped, and the cost function was set to an arbitrarily high value of 3. Without the timeout function, the global search simulation stalled indefinitely, making the subsequent optimization step impossible.Our timeout function was incompatible with our parallelization code, so Models A and C were run without parallelization.Model B did not require the timeout function and was run with parallelization.We hypothesize that the stalling phenomenon may occur for only some model structures and parameter sets because the ODE solver may be forced to take extremely small time steps in some parameter regimes.Skipping these parameter sets in the global search is appropriate because only a few parameter sets were thrown out, therefore not significantly affecting the downstream results.
We note that this timeout function was an extremely important part of our modeling work.Without the timeout function, we were forced to rely on manual parameter estimation, as our parameter estimation workflow could not be run from start-to-finish without stalling out.This non-rigorous, manual parameter estimation method initially led us to erroneously conclude that Model A did not meet any of the modeling objectives and we subsequently added in an additional mechanism to slow down the reactions.After adding in the timeout function and running our full parameter estimation workflow, we concluded that this additional mechanism was in fact not necessary.This anecdote highlights the importance of using an appropriately rigorous parameter estimation workflow, even when extra effort is required to enable the workflow to run (in our case, the timeout function).

Further analysis of sigmoidal behavior of the fits to data sets 2 and 3 for models A and B
The calibrated parameters for models A and B for data sets 2 and 3 did not result in a visually sigmoidal behavior (Supplementary Figs.13-14 for model A and Supplementary Figs.16-17 for model B), although the Hill-fit metrics did enable each one to satisfy modeling Objectives 1 and 2 (Table 1).In each case, we performed further analysis to determine whether we could achieve a more visually sigmoidal behavior.For each model and data set, we initialized the PEM optimization (Materials and Methods) with the corresponding calibrated parameter values for the fit to data set 1, which did exhibit a visually sigmoidal behavior for models A and B (Supplementary Figs. 12 and 15, respectively).We found that this approach did not result in a visually sigmoidal behavior for either model or data set, which indicates that it was not possible to achieve a visually sigmoidal behavior for the fits to data sets 2 and 3 with these mechanisms alone.

Supplementary Note 4 | Parameter identifiability
One limitation of the model is that not all parameters are identifiable, i.e., capable of being uniquely estimated within a finite confidence interval given the training data.Therefore, if the model were used to make predictions, such as of an optimal enzyme concentration or predicting the impact of a strategic intervention designed to decrease the time to readout, these predictions might also not be constrained and could be difficult to validate.For any case in which the model is used to make predictions, parameter identifiability analysis could guide model reduction or experimental design with the goal of arriving at a fully identifiable model with well-constrained predictions 3 .However, as the path to an identifiable model often changes depending on the type of prediction that is desired, such an analysis is beyond our current scope, as we focus on the insight gained from the explanatory model.

Fig. 10 |
Parameter estimation method workflow and evaluation.a, Parameters were estimated using a multi-start optimization strategy.First, a Latin hypercube global search with nsearch parameter sets was used to sample the parameter space.The top ninit parameter sets (with the lowest cost function values) were each used to initialize independent optimization runs using the Levenberg-Marquardt algorithm.The optimized parameter set with the lowest cost function was defined as the calibrated parameter set.b, The parameter estimation method (PEM) was evaluated for each model.First, a Latin hypercube global search was performed, and the results were filtered based on fit to the training data.The top nPEM eval parameter sets were used to generate PEM evaluation training data by using each parameter set to simulate the training data and adding noise to each data point.Next, parameters were estimated using each PEM evaluation data set, and the results were analyzed.Supplementary Fig. 11 | Parameter estimation evaluation results.a-c, PEM evaluation results for model A for Data Sets 1, 2, and 3, respectively.d-f, PEM evaluation results for model B for Data Sets 1, 2, and 3, respectively.g-i, PEM evaluation results for model C for Data Sets 1, 2, and 3, respectively.j-l, PEM evaluation results for model D for Data Sets 1, 2, and 3, respectively.